Faster version of the field selection merging validation #12

acceleratesage · 2019-11-06T10:26:18Z

The current OverlappingFieldsCanBeMerged validation has performance problems with a high number of fragments or selection sets. This implements a faster version of OverlappingFieldsCanBeMerged, that can deal with a high number of fragments or selection sets without a blowup in runtime. This PR does not remove the old validation yet but instead adds the new one as an additional option to use.

It also includes JMH for benchmarks.

Fixes: sangria-graphql#296

The algorithm is described in:
https://tech.xing.com/graphql-overlapping-fields-can-be-merged-fast-ea6e92e0a01

Below is an example benchmark graph showing the performance improvement for multiple fragments in the same selection set.

The PR has been migrated from the one in the original sangria repository.

To use the new validation rule you have to construct a custom query validator that includes it:

import sangria.validation.rules
import sangria.validation.{QueryValidator, RuleBasedQueryValidator, ValidationRule}
import sangria.execution.Executor

// Define the list of validation rules you want to use and replace
// OverlappingFieldsCanBeMerged with the new implementation.
val rules: List[ValidationRule] = {
  QueryValidator.allRules.collect {
    case _: rules.OverlappingFieldsCanBeMerged => new rules.experimental.OverlappingFieldsCanBeMerged
    case rule                                  => rule
  }
}

// Build a query validator based on the rules
val validator: QueryValidator = new RuleBasedQueryValidator(rules)

// Then pass the custom QueryValidator when executing the query
Executor.execute( ... queryValidator = queryValidator ...)

acceleratesage · 2019-11-07T09:49:35Z

The build is only failing for scala 2.11, probably due to a difference in type inference. Is scala 2.11 still required to be supported and shall we update this PR?

yanns · 2019-11-07T12:43:47Z

The build can also be flaky. I triggered it again.

yanns · 2019-11-07T12:49:20Z

Nope, we have a compilation on scala 2.11.
For the moment, we have not mentioned removing support for scala 2.11.
If you can fix that, it'd a great! ❤️

By the way, great article!
I have not yet token time to review your PR, but I very like having this article to understand your approach.

paulpdaniels · 2019-11-07T13:03:27Z

Yes removing 2.11 is a much bigger conversation. We have a large legacy system still on 2.11, so we still need it.

acceleratesage · 2019-11-08T14:34:57Z

This compiles on all the supported versions now and is ready for review.

acceleratesage · 2019-11-08T14:41:58Z

The latest build error seems to be unrelated to this PR, could you please retrigger the build?

acceleratesage · 2019-11-12T13:48:41Z

Unfortunately according to our benchmarks the version with the scala collections is quite a bit slower in the non-asymptotic case than the original one with mutable Java collections. What is your stance regarding this? I would prefer to go back to the Java collections and possibly use anonymous class callbacks instead of lambdas, as scala-2.11 doesn't support lambda SAM syntax yet.

yanns · 2019-11-12T15:02:54Z

Unfortunately according to our benchmarks the version with the scala collections is quite a bit slower in the non-asymptotic case than the original one with mutable Java collections. What is your stance regarding this? I would prefer to go back to the Java collections and possibly use anonymous class callbacks instead of lambdas, as scala-2.11 doesn't support lambda SAM syntax yet.

I trust your decision here.

acceleratesage · 2019-11-12T15:24:45Z

It seems to be mostly related to LinkedHashMap / LinkedHashSet performance in scala. I see there is also an open issue regarding their implementation here: scala/bug#11369

If I find the time, I will revert / refactor this tomorrow. In the meantime, is there anything else you would need to get this merged?

yanns · 2019-11-12T16:25:39Z

If I find the time, I will revert / refactor this tomorrow. In the meantime, is there anything else you would need to get this merged?

time... 😉

Implements a faster version of OverlappingFieldsCanBeMerged, that can deal with a high number of fragments or selection sets without a blowup in runtime. Also includes JMH for benchmarks. The implementation uses Java collections as: - The scala LinkedHashSet implementation is slow - Java util.ArrayList does not preallocate the array The anonymous class syntax for consumers and functions is necessary for compatiblity with scala 2.11, which does not support SAM lambda syntax yet. It can be refactored to use lambda, when scala 2.11 support is dropped. Refs: https://tech.xing.com/graphql-overlapping-fields-can-be-merged-fast-ea6e92e0a01

acceleratesage · 2019-11-13T13:32:11Z

I would again ask you to restart the build, as it is an ArrayOutOfBounds exception that does not seem related to the code. Regarding the changes, I reverted back to using Java collections and for scala 2.11 compatibility use anonymous classes instead of lambdas in the higher order functions.

Java collections seem currently the better choice for this, as:

The scala LinkedHashSet / Map implementation is quite a bit slower than the Java one
The java util.ArrayList implementation does not preallocate the array upon construction, while the scala one does, so when no elements are added there is less allocation with the Java version.

Thanks for taking the time to review this rather big PR!

yanns

General question: how can people use this new implementation?
Should we find an alternative name to OverlappingFieldsCanBeMergedFast?
Does it differ from OverlappingFieldsCanBeMerged "just" by its speed, and does it have a different behavior?

yanns · 2019-11-14T12:26:43Z

build.sbt

+classDirectory in Jmh := (classDirectory in Test).value
+dependencyClasspath in Jmh := (dependencyClasspath in Test).value
+compile in Jmh := (compile in Jmh).dependsOn(compile in Test).value
+run in Jmh := (run in Jmh).dependsOn(compile in Jmh).evaluated


I think it's better to put the benchmarks into a separated module

And then the JMH plugin could only be activated for the benchmarks.

Ex: https://github.com/circe/circe/blob/6d2e2dddc42c7b38f5ed35527d5caad8b33bff35/build.sbt#L476-L487

Makes sense, I will have a look at this.

src/main/scala/sangria/validation/rules/overlappingfields/OutputName.scala

yanns · 2019-11-14T12:34:47Z

src/test/scala/sangria/validation/rules/OverlappingFieldsCanBeMergedFastSpec.scala

+
+class OverlappingFieldsCanBeMergedFastSpec extends WordSpec with ValidationSupport {
+
+  override val defaultRule = Some(new OverlappingFieldsCanBeMergedFast)


General question:
if I try this tests with the current implementation (override val defaultRule = Some(new OverlappingFieldsCanBeMerged), I have some errors.
Is it expected?

While the rules do implement the same validation, they differ in how they report the errors, so they cannot use the same set of tests. See my answer to your general questions below for more on this.

acceleratesage · 2019-11-14T12:56:26Z

Hi Yann, these questions all make a lot of sense. I'll answer the general ones here:

The implementation does the same validation: everything that was invalid before should be invalid now and everything that was valid before should be valid now.
It differs in how it reports the validation errors. It cannot fully reproduce the level of detail that the original implementation has in its reporting with regard to the origin of the conflicting fields as it looses some of the original nesting/fragment information as a result of the optimization.
Coming up with another better name is fine :)
I did not replace the original implementation, as I would leave this decision to the maintainers and the community and did not want it to be an issue in getting it accepted. Currently this implementation can be used by configuring a custom set of validation rules and including this instead of the original one. The original one is still the default and I would consider replacing it a separate step.

yanns · 2019-11-14T13:21:33Z

Hi Yann, these questions all make a lot of sense. I'll answer the general ones here:
* The implementation does the same validation: everything that was invalid before should be invalid now and everything that was valid before should be valid now.

ok 👍

* It differs in how it reports the validation errors. It cannot fully reproduce the level of detail that the original implementation has in its reporting with regard to the origin of the conflicting fields as it looses some of the original nesting/fragment information as a result of the optimization.

* Coming up with another better name is fine :)

* I did not replace the original implementation, as I would leave this decision to the maintainers and the community and did not want it to be an issue in getting it accepted. Currently this implementation can be used by configuring a custom set of validation rules and including this instead of the original one. The original one is still the default and I would consider replacing it a separate step.

I very appreciate this.
Maybe you could add an example in the description of the PR about how to use it, to ease adoption. Then we could link that in the release notes.

…utName.scala Co-Authored-By: Yann Simon <yann.simon.fr@gmail.com>

acceleratesage · 2019-11-14T13:39:00Z

I added an example of how to use the new validation rule to the PR description.

acceleratesage · 2019-11-14T16:24:30Z

@yanns I have changed the sangria project setup to a multi-module build in a separate PR: #14 I think this way we will avoid merge problems, when master changes before this PR gets in, as changing the build setup moves all source files.

acceleratesage · 2019-11-21T07:09:26Z

@yanns Thanks for merging the other PR. I plan to update this one to the new structure next week.

…tion # Conflicts: # build.sbt

acceleratesage · 2019-11-27T10:24:31Z

I updated the PR to the new project structure. Please have another look.

yanns

The PR is looking good to me. Thx for all your efforts.
I just have a question about introducing an annotation for api that may change in the future.

yanns · 2019-11-27T10:28:38Z

modules/core/src/main/scala/sangria/validation/rules/OverlappingFieldsCanBeMergedFast.scala

+  *
+  * The algorithm is described in [[CachedCheck]].
+  */
+class OverlappingFieldsCanBeMergedFast extends ValidationRule {


What is your opinion on adding an annotation like akka is doing with@ApiMayChange so that we can rename it in the future?

I would be fine with that. Is there already an established convention how to signify that in sangria?

Ultimately I hope this can replace the previous rule, so the option to choose explicitly could also disappear at some point.

Thinking about this, I would probably prefer using packages for this type of information, as an annotation can go unnoticed by the client, while a package like "internal" or "experimental" has to be referenced in the import by a client. So anyone importing something from an "experimental" package states that he knows what he is doing.

I would be fine with that. Is there already an established convention how to signify that in sangria?

I don't think there is an existing convention.
I'd recommend to "copy" what others are doing, like in the akka project, unless someone has a better idea.

Ultimately I hope this can replace the previous rule, so the option to choose explicitly could also disappear at some point.

👍

What would be the expectation when using @ApiMayChange? Does it still obey semantic versioning?

I'd leave the decision here to you and the other maintainers. Maybe there is a way that we could circumvent the problem and come up with a better name? @ApiMayChange is something I am not particularly fond of, due to how it allows invisible exceptions to semantic versioning. A separately versioned contrib module from which code then is migrated to core could be a clearer solution. But to be honest I'd prefer to get this rule in without that overhead.

I'd leave the decision here to you and the other maintainers. Maybe there is a way that we could circumvent the problem and come up with a better name? @ApiMayChange is something I am not particularly fond of, due to how it allows invisible exceptions to semantic versioning. A separately versioned contrib module from which code then is migrated to core could be a clearer solution. But to be honest I'd prefer to get this rule in without that overhead.

I'm not the "maintainer", just giving some of my time to help.
I also think a separately versioned contrib module would be cleaner.
But giving the time we all can give, I'd suggest to go with the @ApiMayChange annotation for now so that we can merge this.

- introduce an experimental package and move the new rule there - rename OverlappingFieldsCanBeMergedFast to OverlappingFieldsCanBeMerged as the naming conflict is resolved through the new experimental package - add an @ApiMayChange annotation

yanns

Thx a lot!! 🏆

Can you please update the PR comment with the new package name?

acceleratesage · 2019-12-03T11:58:52Z

@yanns Done

yanns · 2019-12-03T12:00:38Z

What a great work (technically & also explaining with the blog post) 🥇

acceleratesage mentioned this pull request Nov 8, 2019

Add a faster version of the field selection merging validation sangria-graphql/sangria#458

Closed

acceleratesage force-pushed the fast-overlapping-fields-validation branch from d4fa9b9 to b3691b1 Compare November 13, 2019 13:16

yanns reviewed Nov 14, 2019

View reviewed changes

Update src/main/scala/sangria/validation/rules/overlappingfields/Outp…

4ff335b

…utName.scala Co-Authored-By: Yann Simon <yann.simon.fr@gmail.com>

acceleratesage added 2 commits November 27, 2019 10:45

Merge branch 'sangria-org/master' into fast-overlapping-fields-valida…

548304f

…tion # Conflicts: # build.sbt

Adapt branch to multi module project structure

5c4bb02

acceleratesage requested a review from yanns November 27, 2019 10:24

yanns reviewed Nov 27, 2019

View reviewed changes

acceleratesage requested a review from yanns December 3, 2019 10:53

yanns approved these changes Dec 3, 2019

View reviewed changes

yanns merged commit 491ef41 into sangria-graphql-org:master Dec 3, 2019

redhead mentioned this pull request Feb 7, 2020

OverlappingFieldsCanBeMerged is slow graphql-java/graphql-java#1786

Closed

redhead mentioned this pull request Apr 8, 2020

Experimental fast overlapping-fields-can-be-merged algorithm graphql-java/graphql-java#1854

Closed

paulpdaniels mentioned this pull request Jul 5, 2020

Implement missing query validations ghostdogpr/caliban#7

Closed

9 tasks

s1na mentioned this pull request Feb 16, 2022

Validation slows on high number of aliases graph-gophers/graphql-go#499

Closed

yanns mentioned this pull request Mar 2, 2023

validation: field selection merging apollographql/apollo-rs#306

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Faster version of the field selection merging validation #12

Faster version of the field selection merging validation #12

acceleratesage commented Nov 6, 2019 •

edited

Loading

acceleratesage commented Nov 7, 2019

yanns commented Nov 7, 2019

yanns commented Nov 7, 2019

paulpdaniels commented Nov 7, 2019

acceleratesage commented Nov 8, 2019

acceleratesage commented Nov 8, 2019

acceleratesage commented Nov 12, 2019

yanns commented Nov 12, 2019

acceleratesage commented Nov 12, 2019

yanns commented Nov 12, 2019

acceleratesage commented Nov 13, 2019 •

edited

Loading

yanns left a comment

yanns Nov 14, 2019

acceleratesage Nov 14, 2019

yanns Nov 14, 2019

acceleratesage Nov 14, 2019

acceleratesage commented Nov 14, 2019

yanns commented Nov 14, 2019

acceleratesage commented Nov 14, 2019

acceleratesage commented Nov 14, 2019

acceleratesage commented Nov 21, 2019

acceleratesage commented Nov 27, 2019

yanns left a comment

yanns Nov 27, 2019

acceleratesage Nov 27, 2019

acceleratesage Nov 27, 2019

yanns Nov 27, 2019

acceleratesage Nov 27, 2019

acceleratesage Nov 27, 2019 •

edited

Loading

yanns Dec 2, 2019

yanns left a comment

acceleratesage commented Dec 3, 2019

yanns commented Dec 3, 2019


		class OverlappingFieldsCanBeMergedFastSpec extends WordSpec with ValidationSupport {

		override val defaultRule = Some(new OverlappingFieldsCanBeMergedFast)

Faster version of the field selection merging validation #12

Faster version of the field selection merging validation #12

Conversation

acceleratesage commented Nov 6, 2019 • edited Loading

acceleratesage commented Nov 7, 2019

yanns commented Nov 7, 2019

yanns commented Nov 7, 2019

paulpdaniels commented Nov 7, 2019

acceleratesage commented Nov 8, 2019

acceleratesage commented Nov 8, 2019

acceleratesage commented Nov 12, 2019

yanns commented Nov 12, 2019

acceleratesage commented Nov 12, 2019

yanns commented Nov 12, 2019

acceleratesage commented Nov 13, 2019 • edited Loading

yanns left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

acceleratesage commented Nov 14, 2019

yanns commented Nov 14, 2019

acceleratesage commented Nov 14, 2019

acceleratesage commented Nov 14, 2019

acceleratesage commented Nov 21, 2019

acceleratesage commented Nov 27, 2019

yanns left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

acceleratesage Nov 27, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yanns left a comment

Choose a reason for hiding this comment

acceleratesage commented Dec 3, 2019

yanns commented Dec 3, 2019

acceleratesage commented Nov 6, 2019 •

edited

Loading

acceleratesage commented Nov 13, 2019 •

edited

Loading

acceleratesage Nov 27, 2019 •

edited

Loading